
    Representation Learning for Visual Data

    This thesis by articles contributes to the field of deep learning, and more specifically to the subfield of deep generative modeling, through work on restricted Boltzmann machines, generative adversarial networks and style transfer networks. The first article examines the idea of estimating the negative phase gradients in Boltzmann machines by sampling from a physical implementation of the model. Through software simulation, we provide an empirical evaluation of how various constraints associated with physical implementations of restricted Boltzmann machines (RBMs), namely noisy parameters, finite parameter amplitude and restricted connectivity patterns, affect their performance as measured by negative log-likelihood. The second article tackles the inference problem in generative adversarial networks (GANs). It proposes a simple extension to the GAN framework, named adversarially learned inference (ALI), which allows inference to be learned jointly with generation in a fully adversarial framework. We show that the learned representation is useful for auxiliary tasks such as semi-supervised learning, obtaining performance competitive with the then state of the art on the SVHN and CIFAR10 semi-supervised learning tasks. Finally, the third article proposes a simple and scalable technique to train a single feedforward style transfer network to model multiple styles. It introduces a conditioning mechanism named conditional instance normalization, which allows the network to capture multiple styles in parallel by learning a different set of instance normalization parameters for each style.
This mechanism is shown to be very efficient and effective in practice, and has inspired multiple efforts to adapt the idea to problems outside of the artistic style transfer domain.
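
Conditional instance normalization can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the thesis's implementation: the `(N, C, H, W)` layout, the lookup-table form of the per-style `gamma`/`beta` parameters and the function name `conditional_instance_norm` are illustrative choices. The point it shows is that only the normalization parameters depend on the selected style, while the rest of the single network is shared.

```python
import numpy as np


def conditional_instance_norm(x, style_id, gamma, beta, eps=1e-5):
    """Conditional instance normalization (illustrative sketch).

    x:           feature maps, shape (N, C, H, W)
    style_id:    integer index of the style to imitate
    gamma, beta: hypothetical per-style scale and shift tables, shape (num_styles, C)
    """
    # Instance normalization: statistics per sample and per channel,
    # computed over the spatial axes only.
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Pick the chosen style's parameters and broadcast them over N, H, W.
    g = gamma[style_id][None, :, None, None]
    b = beta[style_id][None, :, None, None]
    return g * x_hat + b


# Usage: 3 styles, 4 channels; style 1 shifts every channel by 2.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 8))
gamma = np.ones((3, 4))
beta = np.zeros((3, 4))
beta[1] = 2.0
out = conditional_instance_norm(x, 1, gamma, beta)
print(out.shape)  # (2, 4, 8, 8)
```

Adding a new style under this scheme means adding one row to each of `gamma` and `beta`, which is what keeps the approach cheap compared with training a separate network per style.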

    On the Challenges of Physical Implementations of RBMs

    Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, on classical digital computers, are implemented using expensive MCMC. Physical computation offers the opportunity to reduce the cost of sampling by building physical systems whose natural dynamics correspond to drawing samples from the desired RBM distribution. Such a system avoids the burn-in and mixing cost of a Markov chain. However, hardware implementations of this variety usually entail limitations such as low precision and limited range of the parameters, as well as restrictions on the size and topology of the RBM. We conduct software simulations to determine how harmful each of these restrictions is. Our simulations are designed to reproduce aspects of the D-Wave quantum computer, but the issues we investigate arise in most forms of physical computation.
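
The three hardware restrictions, and the classical MCMC step a physical sampler would replace, can be illustrated with a small software simulation. This is a hedged sketch, not the paper's simulator: the helper names `constrain_weights` and `gibbs_step`, the clip-then-noise-then-mask ordering and the additive Gaussian noise model are all assumptions. `gibbs_step` is ordinary block Gibbs sampling for a binary RBM, the kind of Markov chain whose burn-in and mixing cost the physical system is meant to avoid when estimating the model expectation in the negative phase.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def constrain_weights(W, max_amplitude, noise_std, mask, rng):
    """Apply hypothetical hardware constraints to an RBM weight matrix.

    W:    weights, shape (n_visible, n_hidden)
    mask: 0/1 array of the same shape; 0 marks a coupling the topology lacks
    """
    # Finite parameter amplitude: the device only realizes values in a range.
    W = np.clip(W, -max_amplitude, max_amplitude)
    # Noisy parameters: assumed additive Gaussian noise on each realized weight.
    W = W + noise_std * rng.standard_normal(W.shape)
    # Restricted connectivity: missing couplings are exactly zero.
    return W * mask


def gibbs_step(v, W, b, c, rng):
    """One block Gibbs step for a binary RBM with energy
    E(v, h) = -v^T W h - b^T v - c^T h."""
    p_h = sigmoid(v @ W + c)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h


# Usage: a 6x4 RBM whose first hidden unit is disconnected.
rng = np.random.default_rng(0)
W = rng.normal(scale=3.0, size=(6, 4))
mask = np.ones((6, 4))
mask[:, 0] = 0.0
W_hw = constrain_weights(W, max_amplitude=1.0, noise_std=0.05, mask=mask, rng=rng)
v = (rng.random((5, 6)) < 0.5).astype(float)
v, h = gibbs_step(v, W_hw, np.zeros(6), np.zeros(4), rng)
```

Comparing a model trained with `W` against the same model evaluated with `W_hw` is one simple way to measure how much each restriction hurts.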

    Exploring the structure of a real-time, arbitrary neural artistic stylization network

    In this paper, we present a method which combines the flexibility of the neural algorithm of artistic style with the speed of fast style transfer networks to allow real-time stylization using any content/style image pair. We build upon recent work leveraging conditional instance normalization for multi-style transfer networks by learning to predict the conditional instance normalization parameters directly from a style image. The model is trained successfully on a corpus of roughly 80,000 paintings and is able to generalize to previously unobserved paintings. We demonstrate that the learned embedding space is smooth, contains a rich structure and organizes semantic information associated with paintings in an entirely unsupervised manner. Comment: Accepted as an oral presentation at British Machine Vision Conference (BMVC) 201
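
Predicting the normalization parameters from a style image, rather than looking them up by style index, is what lets the network handle unseen paintings. The sketch below assumes the style image has already been reduced to an embedding vector and uses hypothetical affine heads (`W_gamma`, `b_gamma`, `W_beta`, `b_beta`) to produce per-channel `gamma` and `beta`; the paper's actual style prediction network is not reproduced here. `interpolate_styles` is an illustrative helper for moving through the embedding space.

```python
import numpy as np


def predict_norm_params(style_embedding, W_gamma, b_gamma, W_beta, b_beta):
    """Map a style embedding of shape (D,) to per-channel (gamma, beta), each (C,).

    The affine heads (W_*, b_*) are hypothetical stand-ins for the output
    layer of a learned style prediction network.
    """
    gamma = W_gamma @ style_embedding + b_gamma
    beta = W_beta @ style_embedding + b_beta
    return gamma, beta


def interpolate_styles(e_a, e_b, alpha):
    """Blend two style embeddings: alpha=0 returns e_a, alpha=1 returns e_b."""
    return (1.0 - alpha) * e_a + alpha * e_b


# Usage: halfway between two (random, illustrative) style embeddings.
rng = np.random.default_rng(0)
D, C = 8, 4
W_gamma, W_beta = rng.normal(size=(C, D)), rng.normal(size=(C, D))
b_gamma, b_beta = np.ones(C), np.zeros(C)
e_a, e_b = rng.normal(size=D), rng.normal(size=D)
gamma_mid, beta_mid = predict_norm_params(
    interpolate_styles(e_a, e_b, 0.5), W_gamma, b_gamma, W_beta, b_beta
)
```

Because these heads are affine, interpolating embeddings gives exactly the interpolated `gamma` and `beta`; with a deeper nonlinear head that equivalence would no longer hold.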

    Learning Visual Reasoning Without Strong Priors

    Achieving artificial visual reasoning (the ability to answer image-related questions which require a multi-step, high-level process) is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to reason visually but rely on architectures hand-crafted for reasoning. We show that a general-purpose Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to reason visually and effectively. Comment: Full AAAI 2018 paper is at arXiv:1709.07871. Presented at ICML 2017's Machine Learning in Speech and Language Processing Workshop. Code is at http://github.com/ethanjperez/fil
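
Conditional Batch Normalization, as used here, normalizes image features with batch statistics and then applies a per-example scale and shift computed from the question. The sketch below is an assumption-laden illustration, not the authors' code: the question is taken to be an already-encoded vector `q`, the hypothetical linear maps `W_gamma` and `W_beta` stand in for whatever produces the parameters, and only training-mode batch statistics are used (no running averages).

```python
import numpy as np


def conditional_batch_norm(x, q, W_gamma, W_beta, eps=1e-5):
    """Batch normalization whose scale and shift depend on the question.

    x:               image feature maps, shape (N, C, H, W)
    q:               question embeddings, shape (N, D)
    W_gamma, W_beta: hypothetical linear maps, shape (D, C)
    """
    # Training-mode batch statistics per channel (no running averages here).
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Each example gets its own per-channel scale and shift from its question.
    gamma = q @ W_gamma  # (N, C)
    beta = q @ W_beta    # (N, C)
    return gamma[:, :, None, None] * x_hat + beta[:, :, None, None]


# Usage: 3 image/question pairs, 2 feature channels, 5-d question embeddings.
rng = np.random.default_rng(0)
N, C, D = 3, 2, 5
x = rng.normal(size=(N, C, 4, 4))
q = rng.normal(size=(N, D))
out = conditional_batch_norm(x, q, rng.normal(size=(D, C)), rng.normal(size=(D, C)))
```

The same image features are therefore rescaled differently for different questions, which is the "proper conditioning" the abstract credits for the results.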

    FiLM: Visual Reasoning with a General Conditioning Layer

    We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning, that is, answering image-related questions which require a multi-step, high-level process, a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers (1) halve the state-of-the-art error on the CLEVR benchmark, (2) modulate features in a coherent manner, (3) are robust to ablations and architectural modifications, and (4) generalize well to challenging new data from few examples or even zero-shot. Comment: AAAI 2018. Code available at http://github.com/ethanjperez/film . Extends arXiv:1707.0301
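
The feature-wise affine transformation can be written out explicitly. In the notation below (chosen for illustration), $\mathbf{x}_i$ is the conditioning input for example $i$ (for instance the question), $\mathbf{F}_{i,c}$ is the $c$-th feature map, and $f$ and $h$ are learned functions that produce one scale and one shift per feature map.

```latex
\gamma_{i,c} = f_c(\mathbf{x}_i), \qquad \beta_{i,c} = h_c(\mathbf{x}_i)

\mathrm{FiLM}(\mathbf{F}_{i,c} \mid \gamma_{i,c}, \beta_{i,c})
  = \gamma_{i,c}\,\mathbf{F}_{i,c} + \beta_{i,c}
```

Setting $\gamma_{i,c} = 1$ and $\beta_{i,c} = 0$ leaves a feature map unchanged, $\gamma_{i,c} = 0$ suppresses it, and a negative $\gamma_{i,c}$ flips its sign, so a single conditioning input can amplify, silence or invert individual features.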